fix(operator): targeted reindex after pg_resetwal via amcheck by passcod · Pull Request #68 · beyondessential/postgres-restore-operator

passcod · 2026-06-09T06:52:59Z

🤖

Summary

When the restore init script falls back to pg_resetwal -f (the operator's two-stage WAL-recovery-failure path), it touches a new /pgdata/needs-reindex-all marker. The main container's background-reindex hook picks up that marker and runs a targeted reindex via the amcheck contrib extension: scan every valid btree index, REINDEX only the ones that fail the structural check, plus a blind REINDEX of every non-btree index. All before the readiness probe lets traffic in.

Why not REINDEX DATABASE

A blind REINDEX DATABASE takes hours on the prod-sized DBs (known from prior experience — the prod indexes are many). Most pg_resetwal cases corrupt zero or a handful of indexes; rewriting everything is wasteful.

amcheck reads each btree index page and verifies its invariants — including the "unexpected zero page at block N" case that prompted this. Cost: ~index-size of disk reads. For a clean snapshot it finds nothing and reads only the indexes; for a corrupt one it identifies exactly which need rebuilding.

Caveats:

amcheck only covers btree. GIN / GiST / BRIN / hash indexes get a blind REINDEX (typically a small fraction of total index size; safer than skipping).
bt_index_check (light variant) takes only AccessShareLock and doesn't block writes.
bt_index_check isn't exhaustive — there are corruption shapes it misses. But it does catch zero-page issues, which is the reported failure mode.

Shape

postgres_single_or_resetwal (init script) now touches /pgdata/needs-reindex-all at both pg_resetwal invocation sites (detected-WAL-signature branch + last-resort branch).
Init script's stage selection treats either marker as "not yet ready".
Main container's background reindex hook:
- needs-reindex-all present → enable amcheck, scan btrees, REINDEX corrupt btrees + all non-btrees. Clears both markers (the locale-only set is a strict subset).
- else needs-reindex present → existing collation-only loop.
Readiness probe waits for both markers to be absent.
REINDEX CONCURRENTLY on PG ≥ 14, plain REINDEX on older.

Tests

Three new unit tests:

Every pg_resetwal -f "$PGDATA" invocation in the init script is paired with touch /pgdata/needs-reindex-all.
The runtime hook's needs-reindex-all branch enables amcheck, calls bt_index_check, collects non-btree indexes, runs targeted REINDEX, and explicitly does not use REINDEX DATABASE.
Readiness probe waits on both markers.

pg_resetwal bypasses WAL replay — it tells postgres to act as if the data dir is consistent without replaying anything. Any index update in flight at snapshot time can leave torn pages, which postgres later surfaces as 'unexpected zero page at block N' when a query happens to hit the corrupted index. Downstream users reported exactly this against a primary key, with the hint 'Please REINDEX it.' A blind REINDEX DATABASE takes hours on the prod-sized DBs (known from prior experience). Instead, use postgres's amcheck contrib extension to do a structural scan: read each valid btree index, verify its invariants, queue only the failing ones for REINDEX. Plus a blind REINDEX of every non-btree index (amcheck only covers btree, non-btree are usually a tiny fraction of total index size). Reuses the existing post-restore reindex infrastructure (the /pgdata/needs-reindex marker, the background hook in the postgres container startup, the readiness probe gate). Adds a second marker /pgdata/needs-reindex-all set by every pg_resetwal -f invocation in the operator's two-stage fallback path. The runtime hook handles both flags, with the -all branch supplanting the narrow locale-only branch when both are set.

…SE CONCURRENTLY The amcheck-driven smart pass introduced in #68 hits the same family of postgres-internal pathology that wedges other vanilla DDL on the prod dataset — bt_index_check itself burns 100% CPU forever on specific indexes (observed via pg_stat_activity: state=active, empty wait_event, query_start growing linearly with wall clock for 4+ minutes on a tiny system catalog index). The smart pass is unusable on this data. Two changes: 1. Drop bt_index_check; revert the needs-reindex-all branch to a blind REINDEX DATABASE per user DB, CONCURRENTLY on PG ≥ 12 so clients can keep using the old indexes during the rebuild and the atomic swap makes corruption disappear without downtime. REINDEX reads the heap and rebuilds the index from scratch — a different code path from amcheck (which reads corrupt index pages directly) — so it isn't subject to the same wedge. Slow on prod-sized DBs (hours) but makes progress. 2. Drop the readiness probe's gate on /pgdata/needs-reindex-all. With the reindex taking hours, gating readiness here trips the operator's 30-minute deployment_ready_timeout and the restore is marked Failed before postgres even has a chance to come up. The probe still gates on /pgdata/needs-reindex (locale-only, finishes in seconds) since that's small and proven. Trade-off: clients hitting a not-yet-reindexed corrupt index in the window between pod-Ready and reindex-complete get the explicit "unexpected zero page" error from postgres. With CONCURRENTLY the window is narrow (queries hit the old index until the atomic swap) and clients can retry. Strictly better than the alternative (permanently failed restore, replica stuck indefinitely on the previous Active). The pre-#68 behaviour for the locale-only path is unchanged.

passcod force-pushed the operator-reindex-after-resetwal branch from dfcf7d2 to 34876f1 Compare June 9, 2026 06:57

passcod changed the title ~~fix(operator): REINDEX DATABASE after pg_resetwal fallback~~ fix(operator): targeted reindex after pg_resetwal via amcheck Jun 9, 2026

passcod force-pushed the operator-reindex-after-resetwal branch from 34876f1 to fa961fd Compare June 9, 2026 07:00

passcod enabled auto-merge June 9, 2026 07:00

passcod merged commit d9c6e95 into main Jun 9, 2026
13 checks passed

passcod deleted the operator-reindex-after-resetwal branch June 9, 2026 07:10

passcod mentioned this pull request Jun 10, 2026

fix(operator): un-gate readiness on needs-reindex-all, REINDEX DATABASE CONCURRENTLY #69

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(operator): targeted reindex after pg_resetwal via amcheck#68

fix(operator): targeted reindex after pg_resetwal via amcheck#68
passcod merged 1 commit into
mainfrom
operator-reindex-after-resetwal

passcod commented Jun 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

passcod commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why not REINDEX DATABASE

Shape

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

passcod commented Jun 9, 2026 •

edited

Loading